General-Purpose Join Algorithms for Listing Triangles in Large Graphs

Author

  • Daniel Zinn
Abstract

We investigate applying general-purpose join algorithms to the triangle listing problem in an out-of-core context. In particular, we focus on Leapfrog Triejoin (LFTJ) by Veldhuizen [36], a recently proposed, worst-case optimal algorithm. We present “boxing”: a novel, yet conceptually simple, approach for feeding input data to LFTJ. Our extensive analysis shows that this approach is I/O efficient, being worst-case optimal (in a certain sense). Furthermore, if the input data is only a constant factor larger than the available memory, then a boxed LFTJ essentially maintains the CPU data-complexity of the vanilla LFTJ. Next, focusing on LFTJ applied to the triangle query, we show that for many graphs boxed LFTJ matches the I/O complexity of MGT [10], the specialized algorithm recently proposed by Hu, Tao and Chung for listing triangles in an out-of-core setting. We also strengthen the analysis of LFTJ’s computational complexity for the triangle query by considering families of input graphs that are characterized not only by the number of edges but also by a measure of their density. For example, we show that LFTJ achieves a CPU complexity of O(|E| log |E|) for planar graphs, while on general graphs no algorithm can be faster than O(|E|^1.5). Finally, we perform an experimental evaluation for the triangle listing problem, confirming our theoretical results and showing the overall effectiveness of our approach. On all our real-world and synthetic data sets (some of which contain more than 1.2 billion edges), LFTJ in single-threaded mode is within a factor of 3 of the specialized MGT, a penalty that, as we demonstrate, can be alleviated by parallelization.
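
To make the phrase "triangle query" concrete: listing triangles can be viewed as the three-way self-join E(a,b), E(b,c), E(a,c) with a < b < c. The following is a minimal in-memory sketch of that join in Python, assuming the edge list fits in memory. It borrows only the seek-based intersection of sorted lists from the leapfrog idea; it is not Veldhuizen's full trie-based LFTJ and it does not implement boxing. The function names `leapfrog_intersect` and `list_triangles` are hypothetical and chosen for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def leapfrog_intersect(xs, ys):
    """Intersect two sorted lists; the side that is behind seeks past the other."""
    out = []
    i, j = 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            out.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            # Seek in xs to the first position whose value is >= ys[j].
            i = bisect_left(xs, ys[j], i + 1)
        else:
            # Seek in ys to the first position whose value is >= xs[i].
            j = bisect_left(ys, xs[i], j + 1)
    return out


def list_triangles(edges):
    """Yield every triangle (a, b, c) with a < b < c exactly once."""
    # Orient each edge from its smaller to its larger endpoint;
    # self-loops and duplicate edges are dropped (an assumption of this sketch).
    succ = defaultdict(set)
    for u, v in edges:
        if u != v:
            succ[min(u, v)].add(max(u, v))
    adj = {u: sorted(vs) for u, vs in succ.items()}

    for a in sorted(adj):
        for b in adj[a]:
            # c must follow both a and b: intersect the two sorted lists.
            for c in leapfrog_intersect(adj[a], adj.get(b, [])):
                yield (a, b, c)


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (4, 5)]
    print(list(list_triangles(sample)))
```

Orienting each edge from the smaller to the larger vertex ensures that every triangle is reported once rather than six times. An out-of-core variant would additionally have to control which parts of the adjacency lists are resident in memory, which is the concern the abstract's boxing technique addresses.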

Related resources

Listing Triangles

We present new algorithms for listing triangles in dense and sparse graphs. The running time of our algorithm for dense graphs is Õ(n^ω + n^{3(ω−1)/(5−ω)} t^{2(3−ω)/(5−ω)}), and the running time of the algorithm for sparse graphs is Õ(m^{2ω/(ω+1)} + m^{3(ω−1)/(ω+1)} t^{(3−ω)/(ω+1)}), where n is the number of vertices, m is the number of edges, t is the number of triangles to be listed, and ω < 2.373 is the exponent of fast matrix multiplication. With the current boun...

On Tensor Product of Graphs, Girth and Triangles

The purpose of this paper is to obtain a necessary and sufficient condition for the tensor product of two or more graphs to be connected, bipartite or eulerian. Also, we present a characterization of the duplicate graph $G 1 K_2$ to be unicyclic. Finally, the girth and the formula for computing the number of triangles in the tensor product of graphs are worked out.

Finding, Counting and Listing All Triangles in Large Graphs, an Experimental Study

In the past, the fundamental graph problem of triangle counting and listing has been studied intensively from a theoretical point of view. Recently, triangle counting has also become a widely used tool in network analysis. Due to the very large size of networks like the Internet, WWW, or social networks, the efficiency of algorithms for triangle counting and listing is an important issue. The m...

New algorithms for $k$-degenerate graphs

A graph is k-degenerate if any induced subgraph has a vertex of degree at most k. In this paper we prove new algorithms finding cliques and similar structures in these graphs. We design linear time Fixed-Parameter Tractable algorithms for induced and non induced bicliques. We prove an algorithm listing all maximal bicliques in time O(k(n−k)2), improving the result of [D. Eppstein, Arboricity an...

Distributed-Memory Parallel Algorithms for Counting and Listing Triangles in Big Graphs

Big graphs (networks) arising in numerous application areas pose significant challenges for graph analysts as these graphs grow to billions of nodes and edges and are prohibitively large to fit in the main memory. Finding the number of triangles in a graph is an important problem in the mining and analysis of graphs. In this paper, we present two efficient MPI-based distributed memory parallel ...

Journal:
  • CoRR

Volume abs/1501.06689

Publication date: 2015